# Multimodal Image Segmentation
SESAME
MIT
SESAME is an open-source multimodal model based on LLaVA, fine-tuned on a variety of instruction-based image localization (segmentation) datasets.
Text-to-Image
Transformers

tsunghanwu
37
2
Internvl2 5 HiMTok 8B
Apache-2.0
HiMTok is a hierarchical mask token learning framework for image segmentation, fine-tuned from the InternVL2_5-8B large multimodal model.
Image-to-Text
yayafengzi
16
3
Segformer B0 Finetuned Food
Apache-2.0
As its name indicates, a SegFormer-B0 image segmentation model fine-tuned on food images, built with the Transformers library.
Image Segmentation
Transformers English

prem-timsina
20
5